Skip to content

Conversation

@alpe
Copy link
Contributor

@alpe alpe commented Jan 16, 2026

Replaces #2954

alpe added 30 commits November 12, 2025 15:16
* main:
  fix: remove duplicate error logging in light node shutdown (#2841)
  chore: fix incorrect function name in comment (#2840)
  chore: remove sequencer go.mod (#2837)
* main:
  build(deps): Bump the go_modules group across 2 directories with 3 updates (#2846)
  build(deps): Bump github.com/dvsekhvalnov/jose2go from 1.7.0 to 1.8.0 in /test/e2e (#2851)
  build(deps): Bump github.com/consensys/gnark-crypto from 0.18.0 to 0.18.1 in /test/e2e (#2844)
  build(deps): Bump github.com/cometbft/cometbft from 0.38.17 to 0.38.19 in /test/e2e (#2843)
  build(deps): Bump github.com/dvsekhvalnov/jose2go from 1.6.0 to 1.7.0 in /test/e2e (#2845)
(cherry picked from commit c44cd77e665f6d5d463295c6ed61c59a56d88db3)
* main:
  chore: reduce log noise (#2864)
  fix: sync service for non zero height starts with empty store (#2834)
  build(deps): Bump golang.org/x/crypto from 0.43.0 to 0.45.0 in /execution/evm (#2861)
  chore: minor improvement for docs (#2862)
* main:
  chore: bump da (#2866)
  chore: bump  core (#2865)
* main:
  chore: fix some comments (#2874)
  chore: bump node in evm-single (#2875)
  refactor(syncer,cache): use compare and swap loop and add comments (#2873)
  refactor: use state da height as well (#2872)
  refactor: retrieve highest da height in cache (#2870)
  chore: change from event count to start and end height (#2871)
* main:
  chore: remove extra github action yml file (#2882)
  fix(execution/evm): verify payload status (#2863)
  feat: fetch included da height from store (#2880)
  chore: better output on errors (#2879)
  refactor!: create da client and split cache interface (#2878)
  chore!: rename `evm-single` and `grpc-single` (#2839)
  build(deps): Bump golang.org/x/crypto from 0.42.0 to 0.45.0 in /tools/da-debug in the go_modules group across 1 directory (#2876)
  chore: parallel cache de/serialization (#2868)
  chore: bump blob size (#2877)
* main:
  build(deps): Bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /docs in the npm_and_yarn group across 1 directory (#2900)
  refactor(block): centralize timeout in client (#2903)
  build(deps): Bump the all-go group across 2 directories with 3 updates (#2898)
  chore: bump default timeout (#2902)
  fix: revert default db (#2897)
  refactor: remove obsolete // +build tag (#2899)
  fix:da visualiser namespace  (#2895)
  refactor: omit unnecessary reassignment (#2892)
  build(deps): Bump the all-go group across 5 directories with 6 updates (#2881)
  chore: fix inconsistent method name in retryWithBackoffOnPayloadStatus comment (#2889)
  fix: ensure consistent network ID usage in P2P subscriber (#2884)
  build(deps): Bump golangci/golangci-lint-action from 9.0.0 to 9.1.0 (#2885)
  build(deps): Bump actions/checkout from 5 to 6 (#2886)
* main: (34 commits)
  feat: make reaper poll duration configurable (#2951)
  chore!: move sequencers to pkg (#2931)
  feat: Ensure Header integrity on DA (#2948)
  feat(testda): add header support with GetHeaderByHeight method (#2946)
  chore: improve code comments clarity (#2947)
  chore(sequencers): optimize store check (#2945)
  fix: make evm_execution more robust (#2942)
  fix(sequencers/single): deterministic queue (#2938)
  fix(block): fix init logic sequencer for da epoch fetching (#2926)
  feat: use DA timestamp (#2939)
  chore: improve code comments clarity (#2943)
  build(deps): bump libp2p (#2937)
  build(deps): Bump actions/cache from 4 to 5 (#2934)
  build(deps): Bump actions/download-artifact from 6 to 7 (#2933)
  build(deps): Bump actions/upload-artifact from 5 to 6 (#2932)
  feat: DA Client remove interface part 3, replace types with new code (#2910)
  DA Client remove interface: Part 2.5, create e2e test to validate that a blob is posted in DA layer. (#2920)
  fix(syncing): skip forced txs checks for p2p blocks (#2922)
  build(deps): Bump the all-go group across 5 directories with 5 updates (#2919)
  chore: loosen syncer state check (#2927)
  ...
alpe added 16 commits January 7, 2026 16:11
* main:
  feat(tracing): add tracing to EngineClient (#2959)
  chore: inject W3C headers into engine client and eth client (#2958)
  feat: adding tracing for Executor and added initial configuration (#2957)
* main:
  fix: Fail fast when executor ahead (#2966)
  feat(block): async epoch fetching (#2952)
  perf: tune badger defaults and add db bench (#2950)
* main:
  chore: adding syncing tracing (#2981)
  feat(tracing): adding block production tracing (#2980)
  feat(tracing): Add Store, P2P and Config tracing (#2972)
  chore: fix upgrade test (#2979)
  build(deps): Bump github.com/ethereum/go-ethereum from 1.16.7 to 1.16.8 in /execution/evm/test in the go_modules group across 1 directory (#2974)
  feat(tracing): adding tracing to DA client (#2968)
  chore: create onboarding skill  (#2971)
  test: add e2e tests for force inclusion (part 2) (#2970)
  feat(tracing): adding eth client tracing (#2960)
  test: add e2e tests for force inclusion (#2964)
  build(deps): Bump the all-go group across 4 directories with 10 updates (#2969)
* main:
  fix: inconsistent state detection and rollback (#2983)
  chore: improve graceful shutdown restarts (#2985)
  feat(submitting): add posting strategies (#2973)
@claude
Copy link
Contributor

claude bot commented Jan 16, 2026

Code Review: High Availability via RAFT

Starting comprehensive code review of the RAFT implementation.

Review Checklist:

  • Read repository guidelines (CLAUDE.md)
  • Review core RAFT implementation (pkg/raft/*)
  • Review state synchronization changes (executor, syncer)
  • Review node lifecycle and failover logic
  • Review configuration management
  • Review test coverage and E2E tests
  • Check error handling and resilience patterns
  • Check security considerations
  • Check concurrency safety
  • Provide final review feedback

View job run

@github-actions
Copy link
Contributor

github-actions bot commented Jan 16, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedJan 19, 2026, 9:53 AM

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @alpe, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request fundamentally changes how Rollkit nodes handle high availability by integrating the RAFT consensus algorithm. The primary goal is to ensure continuous operation and data consistency across a cluster of sequencer nodes, even in the event of node failures. The changes involve a significant architectural shift to support dynamic role management (leader/follower) and robust state synchronization, allowing nodes to automatically recover and maintain a consistent view of the blockchain state. This feature is crucial for building resilient and fault-tolerant Rollkit deployments.

Highlights

  • RAFT Consensus Integration: Introduces the Raft consensus algorithm for high availability, enabling dynamic leader election and state replication among sequencer nodes using the hashicorp/raft library.
  • Dynamic Leader Election & Failover: Implements a DynamicLeaderElection mechanism that allows nodes to seamlessly transition between leader and follower roles. This includes logic for handling leader loss, ensuring FSM synchronization before leadership, and attempting recovery if a new leader is not synced.
  • Enhanced State Synchronization & Recovery: The Executor and Syncer components are updated to verify local state against the Raft replicated state. This includes automatic rollbacks if the execution layer is ahead, checks for block existence in the store before syncing, and robust recovery mechanisms from Raft state during failover scenarios.
  • Refactored Node Architecture: The node's P2P client creation and management are refactored, and a new failoverState structure is introduced to encapsulate components that need to be reset during mode switches (aggregator/sync). This simplifies the node's lifecycle management in a HA setup.
  • New RPC Endpoint for RAFT Status: Adds a /raft/node HTTP endpoint to query the Raft node's leadership status and ID, providing visibility into the consensus state.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant and well-executed feature: high availability for the sequencer using the Raft consensus algorithm. The changes are extensive, refactoring core components to support dynamic leader election and failover between aggregator (leader) and sync (follower) roles. The new architecture appears robust, with strong consistency checks to prevent issues like state divergence or stale leaders. The addition of comprehensive end-to-end tests for failover scenarios is particularly commendable. My review includes a few suggestions to improve configuration handling and error messages, aiming to enhance the operator experience and debuggability of this critical new functionality.

Comment on lines +163 to 166
prevState, err = s.store.GetStateAtHeight(ctx, height-1)
if err != nil {
return fmt.Errorf("failed to get previous state: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better debuggability, it would be helpful to include the specific height that was being queried in the error message when getting the previous state fails.

Suggested change
prevState, err = s.store.GetStateAtHeight(ctx, height-1)
if err != nil {
return fmt.Errorf("failed to get previous state: %w", err)
}
prevState, err = s.store.GetStateAtHeight(ctx, height-1)
if err != nil {
return fmt.Errorf("failed to get previous state at height %d: %w", height-1, err)
}

Comment on lines +121 to 122
return nil, fmt.Errorf("raft config must be used in sequencer setup only")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error message "raft config must be used in sequencer setup only" could be more explicit. It seems the design requires a node to be configured as a potential aggregator to participate in a Raft cluster. A clearer message would improve user experience when configuring nodes.

Suggested change
return nil, fmt.Errorf("raft config must be used in sequencer setup only")
}
return nil, fmt.Errorf("raft can only be enabled for aggregator nodes (aggregator flag must be true)")

Comment on lines +139 to +141
if svrs := deduplicateServers(cfg.Servers); len(svrs) != len(cfg.Servers) {
return fmt.Errorf("duplicate peers found in config: %v", cfg.Servers)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Instead of returning an error for duplicate peers in the configuration, it would be more user-friendly to log a warning and proceed with the deduplicated list. The deduplicateServers function already handles the removal of duplicates, so failing fast might be overly strict for what is likely a minor configuration mistake.

Suggested change
if svrs := deduplicateServers(cfg.Servers); len(svrs) != len(cfg.Servers) {
return fmt.Errorf("duplicate peers found in config: %v", cfg.Servers)
}
if svrs := deduplicateServers(cfg.Servers); len(svrs) != len(cfg.Servers) {
n.logger.Warn().Msgf("duplicate peers found in config, using deduplicated list: %v", svrs)
cfg.Servers = svrs
}

@alpe alpe marked this pull request as draft January 18, 2026 10:38
@codecov
Copy link

codecov bot commented Jan 19, 2026

Codecov Report

❌ Patch coverage is 34.04067% with 746 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.30%. Comparing base (73297c1) to head (3b4fe3f).

Files with missing lines Patch % Lines
pkg/raft/node.go 11.05% 193 Missing ⚠️
pkg/raft/node_mock.go 45.40% 74 Missing and 21 partials ⚠️
block/internal/executing/executor.go 2.32% 77 Missing and 7 partials ⚠️
block/internal/syncing/syncer.go 0.00% 82 Missing and 1 partial ⚠️
block/internal/syncing/raft_retriever.go 0.00% 75 Missing ⚠️
node/full.go 32.30% 37 Missing and 7 partials ⚠️
pkg/raft/election.go 67.20% 29 Missing and 12 partials ⚠️
node/failover.go 73.38% 24 Missing and 13 partials ⚠️
tools/local-da/rpc.go 0.00% 21 Missing ⚠️
block/internal/common/replay.go 50.00% 14 Missing and 6 partials ⚠️
... and 14 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2987      +/-   ##
==========================================
- Coverage   58.98%   56.30%   -2.69%     
==========================================
  Files         103      109       +6     
  Lines        9902    10880     +978     
==========================================
+ Hits         5841     6126     +285     
- Misses       3434     4078     +644     
- Partials      627      676      +49     
Flag Coverage Δ
combined 56.30% <34.04%> (-2.69%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@alpe alpe marked this pull request as ready for review January 19, 2026 08:42
* main:
  feat(tracing): tracing part 9 sequencer (#2990)
  build(deps): use mainline go-header (#2988)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants